relevance label
Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling
Li, Haoran, Su, Zhiming, Yao, Junyan, Zhang, Enwei, Ji, Yang, Chen, Yan, Zhou, Kan, Feng, Chao, Ran, Jiao
Synthetic data is widely adopted in embedding models to ensure diversity in training data distributions across dimensions such as difficulty, length, and language. However, existing prompt-based synthesis methods struggle to capture domain-specific data distributions, particularly in data-scarce domains, and often overlook fine-grained relevance diversity. In this paper, we present a Chinese short video dataset with 4-level relevance annotations, filling a critical resource gap. Further, we propose a semi-supervised synthetic data pipeline where two collaboratively trained models generate domain-adaptive short video data with controllable relevance labels. Our method enhances relevance-level diversity by synthesizing samples for underrepresented intermediate relevance labels, resulting in a more balanced and semantically rich training dataset. Extensive offline experiments show that the embedding model trained on our synthesized data outperforms those using data generated by prompting or vanilla supervised fine-tuning (SFT). Moreover, we demonstrate that incorporating more diverse fine-grained relevance levels in training data enhances the model's sensitivity to subtle semantic distinctions, highlighting the value of fine-grained relevance supervision in embedding learning. In the search-enhanced recommendation pipeline of Douyin's dual-column scenario, online A/B testing showed that the proposed model increased click-through rate (CTR) by 1.45%, raised the Strong Relevance Ratio (SRR) by 4.9%, and improved the Image User Penetration Rate (IUPR) by 0.1054%.
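The controllable-relevance idea in the abstract above can be sketched as rejection sampling: one model is prompted to generate a candidate at a target relevance level, and a second (judge) model keeps the sample only if it agrees on the level. This is a minimal illustration, not the authors' pipeline; the 4-level scale wording, `build_prompt`, and the callables are all assumptions.

```python
from typing import Callable, Optional

# Hypothetical 4-level relevance scale, mirroring the paper's 4-level annotation.
LEVELS = {0: "irrelevant", 1: "weakly relevant", 2: "partially relevant", 3: "strongly relevant"}

def build_prompt(query: str, level: int) -> str:
    """Instruction for the generator model: produce a video title at a target relevance level."""
    return (f"Write a short-video title that is {LEVELS[level]} "
            f"to the search query '{query}'. Return only the title.")

def sample_with_label_control(query: str, level: int,
                              generate: Callable[[str], str],
                              judge: Callable[[str, str], int],
                              max_tries: int = 5) -> Optional[str]:
    """Rejection sampling: keep a candidate only if the judge model
    assigns the same relevance level the generator was asked for."""
    for _ in range(max_tries):
        title = generate(build_prompt(query, level))
        if judge(query, title) == level:
            return title
    return None  # could not hit the target level; skip this sample
```

In practice `generate` and `judge` would wrap the two collaboratively trained models; intermediate levels (1 and 2) are the ones the paper reports as underrepresented.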
Enhancing Transformer-Based Rerankers with Synthetic Data and LLM-Based Supervision
Peshevski, Dimitar, Blazhevski, Kiril, Popovski, Martin, Madjarov, Gjorgji
Effective document reranking is essential for improving search relevance across diverse applications. While Large Language Models (LLMs) excel at reranking due to their deep semantic understanding and reasoning, their high computational cost makes them impractical for many real-world deployments. Fine-tuning smaller, task-specific models is a more efficient alternative but typically depends on scarce, manually labeled data. To overcome this, we propose a novel pipeline that eliminates the need for human-labeled query-document pairs. Our method uses LLMs to generate synthetic queries from domain-specific corpora and employs an LLM-based classifier to label positive and hard-negative pairs. This synthetic dataset is then used to fine-tune a smaller transformer model with contrastive learning using Localized Contrastive Estimation (LCE) loss. Experiments on the MedQuAD dataset show that our approach significantly boosts in-domain performance and generalizes well to out-of-domain tasks. By using LLMs for data generation and supervision rather than inference, we reduce computational costs while maintaining strong reranking capabilities.
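The Localized Contrastive Estimation (LCE) loss named above trains the smaller reranker by softmax over localized groups: each query contributes one positive document and several hard negatives, and the model is penalized by the negative log-probability of the positive within its group. A minimal sketch (group layout with the positive at index 0 is an assumption of this illustration):

```python
import math

def lce_loss(scores: list[list[float]]) -> float:
    """Localized Contrastive Estimation (LCE) loss.
    Each row holds reranker scores for one query's group:
    index 0 is the positive document, the rest are hard negatives.
    The loss is the cross-entropy of a softmax over each group,
    averaged over queries."""
    total = 0.0
    for group in scores:
        log_z = math.log(sum(math.exp(s) for s in group))
        total += log_z - group[0]  # -log softmax(positive)
    return total / len(scores)
```

With one positive and one negative scored equally, the loss is log 2, the chance-level baseline; sharper positive scores drive it toward zero.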
CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce Markets
Wang, Yujing, Chen, Yiren, Li, Huoran, Xu, Chunxu, Luo, Yuchong, Mao, Xianghui, Li, Cong, Du, Lun, Ma, Chunyang, Jiang, Qiqi, Wang, Yin, Gao, Fan, Mo, Wenting, Wen, Pei, Kumar, Shantanu, Park, Taejin, Song, Yiwei, Rajaram, Vijay, Cheng, Tao, Durgia, Sonu, Kolari, Pranam
As global e-commerce platforms continue to expand, companies are entering new markets where they encounter cold-start challenges due to limited human labels and user behaviors. In this paper, we share our experience at Coupang in delivering competitive cold-start relevance matching performance for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework, utilizing a multilingual Large Language Model (LLM) to address three challenges: (1) activating cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge by retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, resulting in successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
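The multi-round self-distillation step above can be illustrated as iterative label cleaning: after each training round, labels that the current model contradicts with high confidence are overwritten by the model's own prediction. This is a simplified sketch; the threshold and overwrite rule are illustrative assumptions, not the paper's exact procedure.

```python
def self_distill_round(labels: list[int], probs: list[list[float]],
                       threshold: float = 0.9) -> list[int]:
    """One round of self-distillation label cleaning: when the current
    model is highly confident and disagrees with the noisy label,
    trust the model's prediction instead."""
    refreshed = []
    for y, p in zip(labels, probs):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[pred] >= threshold and pred != y:
            refreshed.append(pred)  # overwrite a likely label error
        else:
            refreshed.append(y)     # keep the original label
    return refreshed
```

Running this for several rounds, retraining in between, is what makes the strategy "multi-round": each pass trains on progressively cleaner labels.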
Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
Morishetti, Lalitesh, Kumar, Abhay, Scott, Jonathan, Nag, Kaushiki, Sharma, Gunjan, Vashishtha, Shanu, Sridhar, Rahul, Chatter, Rohit, Achan, Kannan
In this paper, we present a novel model architecture for optimizing personalized product search ranking using a multi-task learning (MTL) framework. Our approach uniquely integrates tabular and non-tabular data, leveraging a pre-trained TinyBERT model for semantic embeddings and a novel sampling technique to capture diverse customer behaviors. We evaluate our model against several baselines, including XGBoost, TabNet, FT-Transformer, DCN-V2, and MMoE, focusing on their ability to handle mixed data types and optimize personalized ranking. Additionally, we propose a scalable relevance labeling mechanism based on click-through rates, click positions, and semantic similarity, offering an alternative to traditional human-annotated labels. Experimental results show that combining non-tabular data with advanced embedding techniques in a multi-task learning paradigm significantly enhances model performance. Ablation studies further underscore the benefits of incorporating relevance labels, fine-tuning TinyBERT layers, and TinyBERT query-product embedding interactions. These results demonstrate the effectiveness of our approach in achieving improved personalized product search ranking.
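The behavioral relevance labeling mechanism described above combines click-through rate, click position, and semantic similarity into a discrete label. A minimal sketch of that idea follows; the weights, thresholds, and position normalization are illustrative assumptions, not the authors' published scheme.

```python
def relevance_label(ctr: float, avg_click_pos: float, sem_sim: float,
                    max_pos: int = 48) -> int:
    """Heuristic relevance label from engagement and semantic signals,
    in the spirit of a CTR/position/similarity labeling scheme.
    All inputs except avg_click_pos are expected in [0, 1]."""
    # Earlier click positions indicate stronger relevance.
    pos_score = 1.0 - min(avg_click_pos, max_pos) / max_pos
    score = 0.5 * ctr + 0.2 * pos_score + 0.3 * sem_sim
    if score >= 0.6:
        return 3  # strongly relevant
    if score >= 0.4:
        return 2
    if score >= 0.2:
        return 1
    return 0      # irrelevant
```

The appeal of such a scheme is scale: labels are derived from logged behavior and embeddings rather than per-pair human annotation.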
FinCPRG: A Bidirectional Generation Pipeline for Hierarchical Queries and Rich Relevance in Financial Chinese Passage Retrieval
Xu, Xuan, Chu, Beilin, Lin, Qinhong, Zhong, Yixiao, Wen, Fufang, Liu, Jiaqi, Fei, Binjie, Li, Yu, Yang, Zhongliang, Zhou, Linna
In recent years, large language models (LLMs) have demonstrated significant potential in constructing passage retrieval datasets. However, existing methods still face limitations in expressing cross-doc query needs and controlling annotation quality. To address these issues, this paper proposes a bidirectional generation pipeline, which aims to generate 3-level hierarchical queries for both intra-doc and cross-doc scenarios and mine additional relevance labels on top of direct mapping annotation. The pipeline introduces two query generation methods: bottom-up from single-doc text and top-down from multi-doc titles. The bottom-up method uses LLMs to disassemble and generate structured queries at both sentence-level and passage-level simultaneously from intra-doc passages. The top-down approach incorporates three key financial elements--industry, topic, and time--to divide report titles into clusters and prompts LLMs to generate topic-level queries from each cluster. For relevance annotation, our pipeline not only relies on direct mapping annotation from the generation relationship but also implements an indirect positives mining method to enrich the relevant query-passage pairs. Using this pipeline, we constructed a Financial Passage Retrieval Generated dataset (FinCPRG) from almost 1.3k Chinese financial research reports, which includes hierarchical queries and rich relevance labels. Through evaluations of the mined relevance labels, benchmarking, and training experiments, we assessed the quality of FinCPRG and validated its effectiveness as a passage retrieval dataset for both training and benchmarking.
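The top-down step above groups report titles by the three key financial elements before prompting an LLM per cluster. The clustering itself reduces to keying on (industry, topic, time), assuming those fields have already been extracted; the dict-based report schema here is an assumption for illustration.

```python
from collections import defaultdict

def cluster_report_titles(reports: list[dict]) -> dict:
    """Group report titles by the three key financial elements
    (industry, topic, time); each cluster then seeds one
    topic-level query generated by the LLM."""
    clusters = defaultdict(list)
    for r in reports:
        key = (r["industry"], r["topic"], r["time"])
        clusters[key].append(r["title"])
    return dict(clusters)
```

Each resulting cluster's titles would be passed together in one prompt, so the generated query reflects a cross-doc, topic-level information need rather than a single report.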
When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
Ingram, William A., Banerjee, Bipasha, Fox, Edward A.
Large language models (LLMs) are increasingly used to assign document relevance labels in information retrieval pipelines, especially in domains lacking human-labeled data. However, different models often disagree on borderline cases, raising concerns about how such disagreement affects downstream retrieval. This study examines labeling disagreement between two open-weight LLMs, LLaMA and Qwen, on a corpus of scholarly abstracts related to Sustainable Development Goals (SDGs) 1, 3, and 7. We isolate disagreement subsets and examine their lexical properties, rank-order behavior, and classification predictability. Our results show that model disagreement is systematic, not random: disagreement cases exhibit consistent lexical patterns, produce divergent top-ranked outputs under shared scoring functions, and are distinguishable with AUCs above 0.74 using simple classifiers. These findings suggest that LLM-based filtering introduces structured variability in document retrieval, even under controlled prompting and shared ranking logic. We propose using classification disagreement as an object of analysis in retrieval evaluation, particularly in policy-relevant or thematic search tasks.
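The AUC figure cited in the last abstract (disagreement cases distinguishable with AUCs above 0.74) measures how well a classifier's scores separate disagreement from agreement cases. As a reference point, the AUC itself can be computed directly as the probability that a positive case outranks a negative one, ties counting half; a minimal sketch:

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison:
    the probability that a positive example's score exceeds a
    negative example's score, with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 is chance level, so values above 0.74 from simple lexical classifiers indicate that LLM disagreement is systematic rather than random, which is the study's central claim.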